Abstract: The penalty incurred by imposing a finite delay
constraint in lossless source coding of a memoryless source is
investigated. It is well known that for the so-called block-to-variable
and variable-to-variable codes, the redundancy decays
at best polynomially with the delay, which in this case is identified 
with the block or maximal phrase length, respectively. In stark 
contrast, it is shown that the redundancy can be made to 
decay exponentially with the delay constraint. The corresponding 
redundancy-delay exponent is shown to be bounded from below 
by the Renyi entropy of order 2 of the source, and from above 
(for almost all sources) in terms of the minimal source symbol 
probability and the alphabet size. 

I. Introduction 

It is well known that any memoryless source can be asymptotically
losslessly compressed to its entropy [1]. However, in
the presence of resource constraints, a rate penalty, referred to
as redundancy, is unavoidable. In this work we focus on the
redundancy in the encoding of a memoryless source incurred
by the imposition of a strict end-to-end delay constraint d, i.e.,
under the requirement that the n-th encoded symbol must always
be perfectly reproduced at the decoder by time n + d.

Traditionally, lossless source coding is divided into three
classes: 1) Block-to-Variable (BV) codes (e.g., Huffman codes),
where a fixed block of source symbols is encoded into a
variable length codeword, 2) Variable-to-Block (VB) codes
(e.g., Tunstall codes), where the source sequence is parsed
according to a code-tree, and each phrase is encoded into a
fixed length codeword, and 3) Variable-to-Variable (VV) codes
(e.g., Khodak codes), where the source sequence is parsed and
each phrase is encoded into a variable length codeword. In the
BV regime, a delay constraint is usually interpreted as a block
length constraint, and the redundancy is known to decay at
best polynomially with the delay [2], [3]. In the VB/VV regime
(where the delay is a random variable depending on the source
sequence) the delay constraint is translated into a maximal
phrase length constraint, and the redundancy again decays at
best polynomially with the delay, though sometimes faster than
in the BV case [4], [5].^1

In a delay constrained setting, the traditional framework
above admits two (related) limitations. First, even within
that framework, there is an apparent disparity between delay
and block/phrase length. The reason block/phrase lengths are
identified with delay in the first place is that a repeated use of
the same code allows the source reproduction at block/phrase
length intervals. However, the delay can sometimes be significantly
shorter, for essentially the same reason: Consider a
BV code of block length n = kd obtained by concatenating
k short BV codes of block length d. Clearly, the decoder can
reproduce symbols with a delay d, rather than the much larger
delay n. Waiting until the end of the block would mean the
encoder is "holding back" bits it is already certain of, clearly
an undesirable trait in a delay constrained setting. Of course,
the redundancy associated with such an encoder still decays
polynomially with d, which brings us to the second limitation.
In the traditional setting, the encoder never looks beyond the
end of the current block/phrase, in the sense that the source's
prefix has no effect on the output of the encoder beyond that
point. The encoder is therefore being "reset" roughly every d
symbols. Loosely speaking, the penalty incurred by forcing
these regularly recurring reset points is the source of the
polynomial decay of the redundancy.

^1 These results hold even in the weaker case of an expected delay constraint.

With these observations in mind, we recall a lossless coding 
technique of a different flavor that does not suffer from 
the above shortcomings. In arithmetic coding [6], a source
sequence is sequentially mapped into nested subintervals of 
the unit interval, with length equal to the sequence probability, 
and the common most significant bits of the current subinterval 
are emitted. This way, the encoder never holds back any 
bits it is already certain of, by definition. Moreover, whereas 
BV/VB/VV encoders never look beyond the end of the current
block/phrase, an arithmetic encoder always looks into the 
(possibly infinite) future. Unfortunately, this comes at a cost 
of an unbounded delay (though a bounded expected delay, see 
[7], [8], [9]). Nevertheless, the notion of arithmetic coding
does point us in the right direction. In a delay constrained 
framework, an encoder should by definition be sequential, 
emitting all the bits it can at any given instant. Moreover,
a good delay constrained encoder should always strive to look 
d steps ahead, avoiding "reset" points as much as possible. 
As we shall see, these properties are nicely captured within 
an interval mapping type framework. 

In this paper, we introduce a general framework for lossless
delay constrained coding of a memoryless source, and study
the fundamental tradeoff between delay and redundancy. We
show that, in stark contrast to the polynomial decay within
the traditional framework, the redundancy $\mathfrak{R}(P, d)$ associated
with a memoryless source P over a finite alphabet $\mathcal{X}$ can be
made to decay exponentially with the delay d. Specifically, we
show that^2

(iff) 

where $p_{\min}, p_{\max}$ are the minimal and maximal source symbol
probabilities, and the lower bound holds for almost all
sources^3. We then tighten the upper bound and obtain

$\mathfrak{R}(P, d) \lesssim 2^{-d H_2(P)}$

where $H_2(P)$ is the order 2 Rényi entropy of the source.
For our upper bound, we introduce a construction based on
mismatched arithmetic coding in conjunction with a fictitious
symbol insertion mechanism. For our lower bound, we provide
a useful "generalized interval mapping" representation for
delay constrained encoders.

^2 By $g(d) \lesssim f(d)$ we mean $\liminf_{d\to\infty} \frac{1}{d}\log\frac{f(d)}{g(d)} \ge 0$.

The paper is organized as follows. Our framework is introduced
in Section II and some basic lemmas are derived. In
Section III the delay profile of mismatched arithmetic coding
is analyzed. This analysis is then applied in Section IV, where
a lower bound on the redundancy-delay exponent is derived. In
Section V a corresponding upper bound on the redundancy-delay
exponent for almost all sources is presented. Some final
remarks are given in Section VI.



II. Preliminaries 



A. Notations 



We write $s \preceq t$ to indicate that a string s is a prefix of
a string t, and $s \prec t$ to indicate that $s \preceq t$ and $s \ne t$. The
Lebesgue measure of a set $A \subseteq \mathbb{R}$ is denoted by $|A|$. The
fractional part of a number $a \in \mathbb{R}$ is denoted by $\langle a \rangle \triangleq a - \lfloor a \rfloor$.
The difference modulo-1 $\langle A - B \rangle$ between two sets $A, B \subseteq \mathbb{R}$
is the set of all numbers $\langle a - b \rangle$ where $a \in A$, $b \in B$. For any
function $f : \mathbb{R} \to \mathbb{R}$ and any set $A \subseteq \mathbb{R}$, we write $f(A)$ for
the image of A under f. All logarithms are taken to the base
2. A total order of a finite set is called simply an order.

The following lemma is easily verified.

Lemma 1: Let $A, B \subseteq \mathbb{R}$ be any two sets. Then

(i) If $b \in B$ and $\langle c \rangle \notin \langle A - B \rangle$, then $b + c \notin A$.

(ii) If $b \in B$ and $\langle \log c \rangle \notin \langle \log A - \log B \rangle$, then $bc \notin A$.

B. Sources 

Let $\mathcal{X}$ be a finite alphabet of source symbols. The set of all
length-n strings of symbols from $\mathcal{X}$ is denoted $\mathcal{X}^n$, the set of
all finite length strings is denoted $\mathcal{X}^*$, and the set of all infinite
length strings is denoted $\mathcal{X}^\infty$. We sometimes use the notations
$x^n = x_1 x_2 \cdots x_n$ and $x_m^n = x_m x_{m+1} \cdots x_n$ for finite source
strings, where the convention is that $x_m^n$ is the empty string when m > n.
A discrete memoryless source (DMS) P is defined by a
probability mass function (p.m.f.) $\{P(x) : x \in \mathcal{X}\}$ which
naturally induces a product measure over $\mathcal{X}^*$, via $P(st) = P(s)P(t)$
for all $s, t \in \mathcal{X}^*$, where st is the concatenation of
s and t. Specifically, we denote by $P^n$ the p.m.f. obtained by
restricting P to $\mathcal{X}^n$. An infinite random source string emitted
by the source P will be denoted by $X^\infty$. The entropy of the
source is denoted $H(P)$. The Kullback-Leibler distance, or
divergence, between two sources P, Q over the same alphabet
is denoted $D(P\|Q)$. We write $P \ll Q$ if $Q(x) = 0$ implies
$P(x) = 0$ for all $x \in \mathcal{X}$. The set of all p.m.f.'s over $\mathcal{X}$
is denoted $\mathcal{P}(\mathcal{X})$. The type of a sequence $x^n \in \mathcal{X}^n$ is the
p.m.f. $P_{x^n} \in \mathcal{P}(\mathcal{X})$ corresponding to the relative frequency
of symbols in $x^n$. The set of all possible types of sequences $x^n$
is denoted $\mathcal{P}^n(\mathcal{X})$. The type class of any type $Q \in \mathcal{P}^n(\mathcal{X})$
is the set $T_Q \triangleq \{x^n \in \mathcal{X}^n : P_{x^n} = Q\}$. For $\varepsilon > 0$, let
$\mathcal{P}^n_\varepsilon(\mathcal{X}, P) \subseteq \mathcal{P}^n(\mathcal{X})$ be the subset of all types Q for which
$\|P - Q\|_\infty < \varepsilon$.

^3 Note that such a lower bound cannot hold for all sources, since dyadic
sources can attain zero redundancy with zero delay.

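The following facts are well known [12].
Lemma 2: For any type $Q \in \mathcal{P}^n(\mathcal{X})$ and any $x^n \in T_Q$:

(i) $P(x^n) = 2^{-n(D(Q\|P) + H(Q))}$.

(ii) $|\mathcal{P}^n(\mathcal{X})|^{-1}\, 2^{nH(Q)} \le |T_Q| \le 2^{nH(Q)}$.

(iii) $|\mathcal{P}^n(\mathcal{X})| = \binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1} \le (n+1)^{|\mathcal{X}|}$.

(iv) (AEP) For any $\varepsilon > 0$,

$\lim_{n\to\infty} P\Big( \bigcup_{Q \in \mathcal{P}^n_\varepsilon(\mathcal{X}, P)} T_Q \Big) = 1$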
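
The type-class facts of Lemma 2 are easy to check numerically on short sequences. The following is a minimal sketch, not part of the original development; the helper names (`sequence_type`, `entropy`, `kl_divergence`, `type_class_size`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def sequence_type(seq):
    """Empirical distribution (type) of a sequence: symbol -> relative frequency."""
    n = len(seq)
    return {sym: cnt / n for sym, cnt in Counter(seq).items()}

def entropy(q):
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in q.values() if v > 0)

def kl_divergence(q, p):
    """D(q || p) in bits; assumes p[s] > 0 wherever q[s] > 0."""
    return sum(v * math.log2(v / p[s]) for s, v in q.items() if v > 0)

def type_class_size(seq):
    """|T_Q|: number of sequences with the same type as seq (multinomial coefficient)."""
    size = math.factorial(len(seq))
    for cnt in Counter(seq).values():
        size //= math.factorial(cnt)
    return size

def sequence_probability(seq, p):
    """Probability of seq under the memoryless source p."""
    return math.prod(p[s] for s in seq)
```

For a sequence `seq` of length n with type `q = sequence_type(seq)`, item (i) says that `sequence_probability(seq, p)` should agree with `2 ** (-n * (kl_divergence(q, p) + entropy(q)))`, and item (ii) says that `type_class_size(seq)` should not exceed `2 ** (n * entropy(q))`.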

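The Rényi entropy [13] of order $\alpha$ of a source P is

$H_\alpha(P) \triangleq \frac{1}{1-\alpha} \log \sum_{x \in \mathcal{X}} P(x)^\alpha$

Lemma 3 (From [14]): The Rényi entropy of order $\alpha > 1$
admits the following variational characterization:

$H_\alpha(P) = \min_{Q \in \mathcal{P}(\mathcal{X})} \left[ \frac{\alpha}{\alpha - 1} D(Q\|P) + H(Q) \right]$

For $0 < \alpha < 1$, replace the min with a max.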
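
As a numerical illustration of Lemma 3, the sketch below evaluates the Rényi entropy directly and compares it with the variational objective at a candidate optimizer $Q(x) \propto P(x)^\alpha$. The function names and the choice of candidate are assumptions made for this example; a short calculation shows that the objective at this candidate equals $H_\alpha(P)$, but the source does not state the optimizer explicitly.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    return math.log2(sum(v ** alpha for v in p if v > 0)) / (1 - alpha)

def variational_objective(q, p, alpha):
    """alpha/(alpha-1) * D(q||p) + H(q), the expression optimized in Lemma 3."""
    d = sum(qv * math.log2(qv / pv) for qv, pv in zip(q, p) if qv > 0)
    h = -sum(qv * math.log2(qv) for qv in q if qv > 0)
    return alpha / (alpha - 1) * d + h

def tilted(p, alpha):
    """Candidate optimizer (an assumption of this sketch): q(x) proportional to p(x)^alpha."""
    z = sum(v ** alpha for v in p)
    return [v ** alpha / z for v in p]
```

For example, with `p = [0.5, 0.25, 0.25]` and `alpha = 2`, `variational_objective(tilted(p, 2), p, 2)` should coincide with `renyi_entropy(p, 2)`, while other choices of `q` (such as `p` itself or the uniform distribution) should give a value that is no smaller.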

For any two sources P, Q over the same alphabet $\mathcal{X}$, we
define

$\nu(P, Q) \triangleq \sup_{x \in \mathcal{X} : P(x) > 0} \frac{P(x)}{Q(x)}$

The following is easy to verify.

Lemma 4: $\nu(P, Q) \ge 1$ with equality if and only if P = Q.

C. Encoders 

An encoder is a mapping $\mathcal{E} : \mathcal{X}^* \to \{0, 1\}^*$ such that
for any $s \in \mathcal{X}^*$, $\mathcal{E}(s)$ is the longest common prefix of the
bit strings $\{\mathcal{E}(sx) : x \in \mathcal{X}\}$. Namely, we are assuming the
encoder does not withhold any bits: at any given time it will
have emitted the longest prefix it was certain about. This will
be referred to as the integrity property. Note that the integrity
property implies in particular the consistency property, namely
that $\mathcal{E}(s) \preceq \mathcal{E}(sx)$.

An encoder $\mathcal{E}$ is associated with a delay function, which
returns the minimal number of symbols from a given (infinite)
suffix that needs to be encoded so that a given prefix can
be fully decoded. Formally, the delay function is a mapping
$\delta^{\mathcal{E}} : \mathcal{X}^* \times \mathcal{X}^\infty \to \mathbb{N} \cup \{0, \infty\}$, where $\delta^{\mathcal{E}}(s, x^\infty)$ is the minimal
$k \in \mathbb{N} \cup \{0\}$ such that $\mathcal{E}(s x^k) \preceq \mathcal{E}(t)$ implies that $s \preceq t$, for
any $t \in \mathcal{X}^*$. If no such k exists, then $\delta^{\mathcal{E}}(s, x^\infty) = \infty$.

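The delay profile associated with an encoder $\mathcal{E}$ and a source
P for a given prefix s, is the following extended-real-valued
r.v.:

$\Delta^{\mathcal{E}}(s, P) \triangleq \delta^{\mathcal{E}}(s, X^\infty)$, where $X^\infty \sim P$

The delay profile associated with an encoder $\mathcal{E}$ and a source
P is then defined to be

$\Delta^{\mathcal{E}}(P) \triangleq \sup_{s \in \mathcal{X}^*} \Delta^{\mathcal{E}}(s, P)$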

Next, we define several families of encoders. 



1) Lossless Encoders: An encoder is said to be lossless
w.r.t. P (where P is omitted when there is no confusion), if

$P(\Delta^{\mathcal{E}}(P) < \infty) = 1$

The family of all encoders that are lossless w.r.t. P is denoted
$\mathfrak{L}(P)$.

2) Bounded Expected Delay Encoders: An encoder is said
to admit a bounded expected delay w.r.t. P (where P is omitted
when there is no confusion), if

$E(\Delta^{\mathcal{E}}(P)) < \infty$

The family of all encoders with bounded expected delay w.r.t.
P is denoted $\mathfrak{B}(P)$. Clearly, $\mathfrak{B}(P) \subset \mathfrak{L}(P)$.

3) Delay Constrained Encoders: An encoder is said to be
delay-constrained, if

$\sup_{s \in \mathcal{X}^*,\, t \in \mathcal{X}^\infty} \delta^{\mathcal{E}}(s, t) < \infty$ (1)

More specifically, such an encoder is also said to be d-delay-constrained
if the supremum above equals d. The family of
d-constrained encoders is denoted by $\mathfrak{E}_d$^4. Clearly, $\mathfrak{E}_d \subset \mathfrak{B}(P)$
for any source P.

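4) Phrase/Block Constrained Encoders: An encoder is said
to be phrase-constrained, if for any $x^\infty \in \mathcal{X}^\infty$ there exists
a $d \in \mathbb{N}$ and an index sequence $\{i_k \in \mathbb{N}\}_{k=1}^\infty$ such that
$0 < i_{k+1} - i_k \le d + 1$, and

$\delta^{\mathcal{E}}(x^{i_k}, x_{i_k+1}^\infty) = 0$ (2)

In this case we also say the encoder is d-phrase-constrained. In
the special case where $i_k = (d+1)k$ for all $x^\infty \in \mathcal{X}^\infty$, we say
the encoder is d-block-constrained. The family of all d-phrase-constrained
(resp. d-block-constrained) encoders is denoted by
$\mathfrak{E}_d^{\mathrm{phrase}}$ (resp. $\mathfrak{E}_d^{\mathrm{block}}$). Clearly, $\mathfrak{E}_d^{\mathrm{block}} \subset \mathfrak{E}_d^{\mathrm{phrase}} \subset \mathfrak{E}_d$.

Remark 1: Any encoder $\mathcal{E} \in \mathfrak{E}_d^{\mathrm{block}}$ (resp. $\mathcal{E} \in \mathfrak{E}_d^{\mathrm{phrase}}$)
corresponds to a (possibly time-varying) concatenation of BV
(resp. VV) codes with block length (resp. maximal phrase
length) d + 1.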

5) Interval-Mapping Encoders: A binary string $b^k \in \{0, 1\}^k$
is said to represent a binary interval

$[b^k) \triangleq [0.b_1 b_2 \ldots b_k 000\ldots,\; 0.b_1 b_2 \ldots b_k 111\ldots) \subseteq [0, 1)$

For any set $A \subseteq [0, 1)$ we write bin(A) to denote the minimal
binary interval containing A, i.e.,

$\mathrm{bin}(A) \triangleq \bigcap_{b \in \{0,1\}^* : A \subseteq [b)} [b)$

The following lemma is easily observed.
Lemma 5: For any $b, c \in \{0, 1\}^*$,

(i) $b \preceq c \Rightarrow [c) \subseteq [b)$.

(ii) $b \not\preceq c$ and $c \not\preceq b \Rightarrow [b) \cap [c) = \emptyset$.

Let $\mathfrak{S} \triangleq \{[a, b) : 0 \le a < b \le 1\}$. An encoder $\mathcal{E}$ is said
to be an interval-mapping encoder, if there exists a mapping
$\mathcal{I}^{\mathcal{E}} : \mathcal{X}^* \to \mathfrak{S}$, i.e., a mapping of finite source sequences
into subintervals of the unit interval, such that the following
properties are satisfied:

(i) Minimality: $[\mathcal{E}(s)) = \mathrm{bin}(\mathcal{I}^{\mathcal{E}}(s))$ for any $s \in \mathcal{X}^*$.

(ii) Disjoint nesting: For all $s \in \mathcal{X}^*$ and all distinct $x, y \in \mathcal{X}$,

$\mathcal{I}^{\mathcal{E}}(sx) \subseteq \mathcal{I}^{\mathcal{E}}(s), \quad \mathcal{I}^{\mathcal{E}}(sx) \cap \mathcal{I}^{\mathcal{E}}(sy) = \emptyset$

The minimality property means that an interval-mapping encoder
emits the bit sequence representing the minimal binary
interval containing the interval $\mathcal{I}^{\mathcal{E}}(s)$. It is easily observed that
the minimality and disjoint nesting properties together imply
the integrity property. The family of interval-mapping encoders
is denoted by $\mathfrak{I}$.

^4 Note that growing dictionary encoders such as the LZ encoder [15] do not
belong to this family, as their delay grows unbounded.

Let $\le$ be any order of $\mathcal{X}$. A special case of an interval-mapping
encoder is an arithmetic encoder w.r.t. the order $\le$
matched to a source P, which is defined as follows:

$f_1(x) \triangleq \sum_{y < x} P(y)$

$f_n(x^n) \triangleq f_{n-1}(x^{n-1}) + f_1(x_n) P(x^{n-1})$

$\mathcal{I}^{\mathcal{E}}(x^n) \triangleq [f_n(x^n),\; f_n(x^n) + P(x^n))$

We omit the reference to a specific order $\le$ when there is no
confusion, or when the statement holds for any order.
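
To make the interval recursion and the minimality property concrete, here is a minimal sketch in Python using exact rational arithmetic. The function names `arithmetic_interval` and `emitted_bits` are hypothetical; the code only mirrors the recursion for $f_n$ and the definition of bin(·), and is not meant as a practical (finite-precision) arithmetic coder.

```python
from fractions import Fraction

def arithmetic_interval(seq, pmf, order):
    """Return (low, length) of the interval assigned to seq by the recursion for f_n."""
    cum, acc = {}, Fraction(0)
    for sym in order:                # f_1(x): total probability of symbols below x
        cum[sym] = acc
        acc += pmf[sym]
    low, length = Fraction(0), Fraction(1)
    for sym in seq:
        low += cum[sym] * length     # f_n = f_{n-1} + f_1(x_n) * P(x^{n-1})
        length *= pmf[sym]           # interval length equals P(x^n)
    return low, length

def emitted_bits(low, length):
    """Bits b such that [b) is the minimal binary interval containing [low, low+length)."""
    high = low + length
    bits, lo, hi = "", Fraction(0), Fraction(1)
    while True:
        mid = (lo + hi) / 2
        if high <= mid:              # interval lies in the left half
            bits, hi = bits + "0", mid
        elif low >= mid:             # interval lies in the right half
            bits, lo = bits + "1", mid
        else:                        # the midpoint splits the interval: stop
            return bits
```

For instance, with `pmf = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}` and order `"abc"`, the sequence `"ba"` is mapped to the interval starting at 1/2 with length 1/8, and `emitted_bits` returns the three bits describing that binary interval.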

6) Generalized Interval-Mapping Encoders: Let $\mathfrak{S}^*$ be the
set of all finite disjoint unions of subintervals from $\mathfrak{S}$. An
encoder $\mathcal{E}$ is said to be a generalized interval-mapping encoder
if there exists a mapping $\mathcal{I}^{\mathcal{E}} : \mathcal{X}^* \to \mathfrak{S}^*$ satisfying the
minimality and disjoint nesting properties above. The family
of generalized interval-mapping encoders is denoted by $\mathfrak{I}^*$.
Clearly, $\mathfrak{I} \subset \mathfrak{I}^*$.

The following lemma shows that any d-delay-constrained
encoder admits a generalized interval-mapping representation.

Lemma 6: Let $\mathcal{E} \in \mathfrak{E}_d$. Then $\mathcal{E}$ can be represented as a
generalized interval-mapping encoder with

$\mathcal{I}^{\mathcal{E}}(s) \triangleq \bigcup_{x^d \in \mathcal{X}^d} [\mathcal{E}(s x^d))$ (3)

Hence, $\mathfrak{E}_d \subseteq \mathfrak{I}^*$.

Proof: See the Appendix. ■
Remark 2: The representation in (3) is a finite union of
(possibly overlapping) binary intervals. It is worth noting that
an arithmetic encoder matched to a source cannot generally be
written that way, as some of its intervals may only be written
as an infinite union of binary intervals. This sits well with the
fact that generally, an arithmetic encoder has an unbounded
delay.

D. Redundancy 

The (per symbol) expected codelength at time n associated
with an encoder $\mathcal{E}$ and a memoryless source P is

$L_n^{\mathcal{E}}(P) \triangleq n^{-1} E|\mathcal{E}(X^n)|$ (4)

where $X^n \sim P^n$. The (per symbol) expected redundancy at
time n associated with an encoder $\mathcal{E}$ and a memoryless source
P is the gap between the expected codelength and the entropy
after n symbols have been encoded, i.e.,

$\mathfrak{R}_n^{\mathcal{E}}(P) \triangleq L_n^{\mathcal{E}}(P) - H(P)$



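The corresponding sup-redundancy and inf-redundancy are
defined as

$\overline{\mathfrak{R}}^{\mathcal{E}}(P) \triangleq \limsup_{n\to\infty} \mathfrak{R}_n^{\mathcal{E}}(P), \quad \underline{\mathfrak{R}}^{\mathcal{E}}(P) \triangleq \liminf_{n\to\infty} \mathfrak{R}_n^{\mathcal{E}}(P)$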

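Let us define some useful quantities specific to generalized
interval-mapping encoders, which will enable us to bound their
redundancy in relatively simpler terms. A generalized interval-mapping
encoder $\mathcal{E}$ induces a measure over $\mathcal{X}^n$, defined by

$\mu_n^{\mathcal{E}}(x^n) \triangleq |\mathcal{I}^{\mathcal{E}}(x^n)|$

and a conditional induced measure, defined as

$\mu_k^{\mathcal{E}}(x_{n+1}^{n+k} \,|\, x^n) \triangleq \frac{\mu_{n+k}^{\mathcal{E}}(x^{n+k})}{\mu_n^{\mathcal{E}}(x^n)}$

Define:

$R_n^{\mathcal{E}}(P) \triangleq \frac{1}{n} D(P^n \| \mu_n^{\mathcal{E}})$

and let

$r_d^{\mathcal{E}}(x^n) \triangleq D\left(P^d \,\|\, \mu_d^{\mathcal{E}}(\cdot \,|\, x^n)\right)$

be the d-instantaneous redundancy.

Remark 3: Note that $\mu_n^{\mathcal{E}}$ and $\mu_k^{\mathcal{E}}(\cdot \,|\, x^n)$ are not necessarily
probability distributions, as they may sum to less than unity.
However, for that exact same reason it still holds that $R_n^{\mathcal{E}}(P) \ge 0$
and $r_d^{\mathcal{E}}(x^n) \ge 0$.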

The next lemma relates the interval-based notions of re- 
dundancy defined above, to the actual operational definition 
of redundancy of the associated generalized interval-mapping 
encoders. This correspondence will allow us to think of 
intervals instead of bits, and will play a central role in the 
sequel. 

Lemma 7: The following relations hold: 

(i) For any $\mathcal{E} \in \mathfrak{I}^*$,

$\mathfrak{R}_n^{\mathcal{E}}(P) \le R_n^{\mathcal{E}}(P)$

(ii) For any $\mathcal{E} \in \mathfrak{E}_d$, there exists a generalized interval-mapping
representation (e.g., the one in Lemma 6)
such that



KiP) > 



—) RiUP) + -H{P) 

n J n 



1 



2^ (P) = lim inf V E(r<i(X'')) 
n->oo nd ^ — ' 
fc=l 

Proof: See the Appendix. ■ 
One would naturally be interested in the redundancy performance
that can be guaranteed by employing encoders of
different classes. In general, the expected redundancy $\mathfrak{R}_n^{\mathcal{E}}$ of
an encoder $\mathcal{E}$ can be negative for some, or even all n. However,
the sup- and inf-redundancy are nonnegative for all lossless
encoders, and bounds in the d-block/phrase constrained cases
are known.

Lemma 8: The following statements hold^5:



^5 Recall that $f(d) = O(g(d))$ if $\limsup_{d\to\infty} |f(d)/g(d)| < \infty$, and
$f(d) = \Omega(g(d))$ if $\liminf_{d\to\infty} |f(d)/g(d)| > 0$.




(i) For any source P 

inf M^(P) = inf M^(P) = inf 91^ (P) 

£g£(P) £e<SiP) £€S,{P)~ 

= inf 9^^(P) = 

(ii) (From U^, ^) For any source 

inf M^(P) = 0{d-^) , inf ^\p) = 0{d~i) 

(iii) {From 121?, fi^) For almost all sources, 

inf ^'{P) = n{d-') 

inf 9i^(P) = 

where e > 0. 

We see that when employing block/phrase-constrained codes for
compression under a strict delay constraint, the redundancy
decays at best polynomially with the delay constraint^6. The
main contribution of this paper is to show that in fact, the redundancy
can be made to decay exponentially with the delay, if
the more general family of delay-constrained encoders is used.
This reveals a fundamental difference between block/phrase
length and delay in lossless source coding.

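The following lemma shows that for an optimal d-delay-constrained
encoder, the inf-redundancy and sup-redundancy
coincide.

Lemma 9: For any source P,

$\inf_{\mathcal{E} \in \mathfrak{E}_d} \overline{\mathfrak{R}}^{\mathcal{E}}(P) = \inf_{\mathcal{E} \in \mathfrak{E}_d} \underline{\mathfrak{R}}^{\mathcal{E}}(P) \triangleq \mathfrak{R}(P, d)$

Proof: See the Appendix. ■
Accordingly, $\mathfrak{R}(P, d)$ defined above is called the redundancy-delay
function associated with the source P. The corresponding
sup-redundancy-delay and inf-redundancy-delay exponents
associated with P can now be defined:

$\overline{E}(P) \triangleq \limsup_{d\to\infty} -\frac{1}{d} \log \mathfrak{R}(P, d)$

$\underline{E}(P) \triangleq \liminf_{d\to\infty} -\frac{1}{d} \log \mathfrak{R}(P, d)$

Our main goal in this paper is to characterize $\mathfrak{R}(P, d)$, $\underline{E}(P)$
and $\overline{E}(P)$.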

III. The Delay Profile of Arithmetic Coding 

Consider a case where a source is encoded by a mismatched
arithmetic encoder, namely where the encoder's interval
lengths match a different source (see Subsection II-C).
In the next theorem we upper bound the probability that the
corresponding delay profile exceeds a given threshold. This
result will serve as a tool in the next section, where we lower
bound the redundancy-delay exponent.

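Theorem 1: Suppose a source $P \in \mathcal{P}(\mathcal{X})$ is encoded using
an arithmetic encoder $\mathcal{E}$ matched to a source $Q \in \mathcal{P}(\mathcal{X})$. Then

$P\left(\Delta^{\mathcal{E}}(P) > d\right) \le 2 p_{\max}^d \left( d \log \frac{\nu(P, Q)}{p_{\max}} + \kappa \right) + 2 q_{\max}^d\, \nu(P, Q)^d$ (5)

where $\kappa = \log\frac{\sqrt{2}\, e}{\log e} \approx 1.4139\ldots$

^6 This is in fact true even under the weaker expected delay constraint.
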

Corollary 1: Let $\mathcal{E}$ be an arithmetic encoder matched to a
source $Q \in \mathcal{P}(\mathcal{X})$. For any source $P \in \mathcal{P}(\mathcal{X})$, if

$q_{\max} \cdot \nu(P, Q) < 1$

then the delay profile bound (5) is exponentially decaying with
d, hence the expected delay is finite, i.e., $\mathcal{E} \in \mathfrak{B}(P)$. This
specifically holds for all non-deterministic P = Q.

Corollary 2: Suppose the source P is encoded using the
arithmetic encoder matched to the source. Then

$P(\Delta^{\mathcal{E}}(P) > d) \le 2 p_{\max}^d \left( d \log\frac{1}{p_{\max}} + 1 + \kappa \right)$
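
To get a feel for how fast the tail bound decays, the following sketch evaluates it numerically. It assumes the bound of Theorem 1 has the form $2 p_{\max}^d (d \log(\nu/p_{\max}) + \kappa) + 2 (q_{\max}\nu)^d$ and that $\kappa = \frac{1}{2} + \log e - \log\log e$, a closed form chosen here because it reproduces the stated value 1.4139; both are assumptions of this example, as are the function names.

```python
import math

# Assumed closed form for kappa (base-2 logarithms); evaluates to about 1.4139.
KAPPA = 0.5 + math.log2(math.e) - math.log2(math.log2(math.e))

def nu(p, q):
    """Mismatch factor: largest ratio p(x)/q(x) over symbols with p(x) > 0.
    Requires q(x) > 0 wherever p(x) > 0."""
    return max(p[x] / q[x] for x in p if p[x] > 0)

def delay_tail_bound(p, q, d):
    """Assumed form of the bound on P(delay > d) for source p and an
    arithmetic encoder matched to q."""
    p_max, q_max, v = max(p.values()), max(q.values()), nu(p, q)
    return 2 * p_max ** d * (d * math.log2(v / p_max) + KAPPA) + 2 * (q_max * v) ** d
```

In the matched case (`q` equal to `p`) the mismatch factor is 1 and the expression collapses to the form stated in Corollary 2, which gives a quick consistency check on the assumed formula.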

Remark 4: An exponential bound on the delay's tail distribution
for matched arithmetic coding was originally observed
in [16], [8]. However, that bound depends on both $p_{\min}$ and
$p_{\max}$, and can therefore be arbitrarily loose. A bound depending
only on $p_{\max}$ was originally obtained by the authors in [9],
where it is also shown how the proof of [16], [8] can be tweaked
to remove the dependency on $p_{\min}$. The bound obtained here
is tighter than both.

Remark 5: The bound in Theorem 1 can be further tightened
by observing that specific orders of the alphabet $\mathcal{X}$ are
better than others in terms of the bounding technique used
here. We do not pursue this direction, since we need an order-independent
bound in the sequel.

A. Proof Outline 

Recall the definitions of an interval-mapping encoder and of
an arithmetic encoder in particular, given in Subsection II-C.
At time n, the sequence $x^n$ has been encoded into $\mathcal{I}^{\mathcal{E}}(x^n)$, and
the decoder is so far aware only of the interval $\mathrm{bin}(\mathcal{I}^{\mathcal{E}}(x^n))$,
namely the minimal binary interval containing $\mathcal{I}^{\mathcal{E}}(x^n)$. Thus
the decoder is able to decode $x^m$, where m is maximal such
that $\mathrm{bin}(\mathcal{I}^{\mathcal{E}}(x^n)) \subseteq \mathcal{I}^{\mathcal{E}}(x^m)$. Of course, $m \le n$ where the
inequality is generally strict. After d more source letters are
fed to the encoder, $x^{n+d}$ is encoded into $\mathcal{I}^{\mathcal{E}}(x^{n+d})$, and the
entire sequence $x^n$ can be decoded at time n + d if and only if^7

$\mathrm{bin}(\mathcal{I}^{\mathcal{E}}(x^{n+d})) \subseteq \mathcal{I}^{\mathcal{E}}(x^n)$. (6)

Now, consider the midpoint of $\mathrm{bin}(\mathcal{I}^{\mathcal{E}}(x^n))$ which by the
minimality property (see Subsection II-C) is always contained
in $\mathcal{I}^{\mathcal{E}}(x^n)$. If that midpoint is contained in $\mathcal{I}^{\mathcal{E}}(x^{n+d})$ (but
not as a left edge), then condition (6) cannot be satisfied; in
fact, in this case the encoder cannot yield even one further bit.
This observation can be generalized to a set of points which,
if contained in $\mathcal{I}^{\mathcal{E}}(x^{n+d})$, prevent $x^n$ from being completely decoded.
For each of these points the encoder outputs a number of bits
which may enable the decoder to produce source symbols, but
not enough to fully decode $x^n$. The encoding and decoding
delays are therefore treated here simultaneously, rather than
separately as in [8].
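
Condition (6) can be tested directly for a matched arithmetic encoder. The sketch below, with hypothetical helper names, computes the smallest number of suffix symbols after which the minimal binary interval of the extended sequence fits inside the interval of the prefix. It assumes every symbol has positive probability (no "holes"), so that the containment test is both necessary and sufficient.

```python
from fractions import Fraction

def interval(seq, pmf, order):
    """Endpoints [low, high) of the arithmetic-coding interval of seq."""
    cum, acc = {}, Fraction(0)
    for s in order:
        cum[s] = acc
        acc += pmf[s]
    low, length = Fraction(0), Fraction(1)
    for s in seq:
        low += cum[s] * length
        length *= pmf[s]
    return low, low + length

def min_binary_interval(low, high):
    """Endpoints of the minimal binary interval containing [low, high)."""
    lo, hi = Fraction(0), Fraction(1)
    while True:
        mid = (lo + hi) / 2
        if high <= mid:
            hi = mid
        elif low >= mid:
            lo = mid
        else:
            return lo, hi

def decoding_delay(prefix, suffix, pmf, order):
    """Smallest k <= len(suffix) for which the containment condition holds
    after k suffix symbols, or None if it never holds within the suffix."""
    a, b = interval(prefix, pmf, order)
    for k in range(len(suffix) + 1):
        lo, hi = min_binary_interval(*interval(prefix + suffix[:k], pmf, order))
        if a <= lo and hi <= b:
            return k
    return None
```

With `pmf = {"a": Fraction(1, 3), "b": Fraction(2, 3)}`, the prefix `"a"` followed by a run of `"b"` symbols is never resolved within the run, because the intervals keep sharing the non-dyadic right endpoint 1/3; this illustrates why the delay of an arithmetic encoder is unbounded in general. For a dyadic source every interval is itself a binary interval and the delay is zero.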

Remark 6: When there are symbols x with Q(x) > 0 but P(x) = 0,
there are "holes" in the interval-mapping, namely the intervals
corresponding to these symbols. In this case, $x^n$ can be decoded at
time n + d if and only if $\mathrm{bin}(\mathcal{I}^{\mathcal{E}}(x^{n+d})) \cap \mathcal{I}^{\mathcal{E}}(y^n) = \emptyset$ for any
$y^n \ne x^n$ with $P(y^n) > 0$. Hence condition (6) is necessary and
sufficient if there are no holes, and only sufficient otherwise. This
point is important to note since the case with holes appears in the sequel.

^7 Here we are assuming that there are no holes in the interval-mapping, see Remark 6.

After having identified the above set of forbidden points,
we clearly need to analyze the probability of avoiding them
within the next d instants. Loosely speaking, for an arithmetic
encoder matched to the source P, the maximal symbol
probability $p_{\max}$ represents the "crudest resolution", or the
"lowest rate" by which we shrink our intervals, hence intuitively
dictates our ability to avoid hitting forbidden points.
Indeed, the probability that the encoder fails to avoid these points
is roughly $p_{\max}^d$. For a mismatched encoder, we get a similar
expression involving $p_{\max}$, $q_{\max}$ and $\nu(P, Q)$ as a measure of
the mismatch between the encoder and the source.

B. The Forbidden Points Notion 

We now introduce some notations and prove three lemmas,
required for the proof of Theorem 1. Let $I = [a, b) \subseteq [0, 1)$
be some interval, and p some point in that interval. We say
that p is strictly contained in I if $p \in (a, b)$. We define the
left-adjacent of p w.r.t. I to be

$\ell_I(p) \triangleq \min\{x \in [a, p) : \exists k \in \mathbb{Z}^+,\ x = p - 2^{-k}\}$

and the t-left-adjacent of p w.r.t. I as

$\ell_I^{(t)}(p) \triangleq (\underbrace{\ell_I \circ \ell_I \circ \cdots \circ \ell_I}_{t})(p), \quad \ell_I^{(0)}(p) \triangleq p$

Notice that $\ell_I^{(t)}(p) \to a$ monotonically with t. We also define
the right-adjacent of p w.r.t. I to be

$r_I(p) \triangleq \max\{x \in (p, b) : \exists k \in \mathbb{Z}^+,\ x = p + 2^{-k}\}$

and $r_I^{(t)}(p)$ as the t-right-adjacent of p w.r.t. [a, b) similarly,
where now $r_I^{(t)}(p) \to b$ monotonically. For any $\delta < b - a$, the
adjacent $\delta$-set of p w.r.t. I is defined as the set of all adjacents
that are not "too close" to the edges of I:

$S_\delta(I, p) \triangleq \{x \in [a + \delta, b - \delta) : \exists t \in \mathbb{Z}^+ \cup \{0\},\ x = \ell_I^{(t)}(p) \text{ or } x = r_I^{(t)}(p)\}$

Notice that for $\delta > p - a$ this set may contain only right-adjacents,
for $\delta > b - p$ only left-adjacents, for $\delta > (b - a)/2$ it is
empty, and for $\delta = 0$ it may be infinite.

Lemma 10: The size of $S_\delta(I, p)$ is upper bounded by

$|S_\delta(I, p)| \le 1 + 2 \log\frac{b - a}{\delta}$ (7)

For an interval I, let m(I) denote the midpoint of bin(I).
Note that $m(I) \in I$, by definition of bin(I) as the minimal
binary interval containing I. In what follows, we will be
specifically interested in the adjacent $\delta$-set of m(I) w.r.t. I.
We therefore suppress the dependence on m(I) and write

$S_\delta(I) \triangleq S_\delta(I, m(I))$

In particular, the set $S_0(I)$ will be referred to as the forbidden
points of I. The forbidden points play a central role in the
sequel, for the following reason:
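
The adjacent set is straightforward to enumerate for a strictly positive margin, since the left-adjacents approach the left edge and the right-adjacents approach the right edge geometrically. The following is a minimal sketch under that assumption (the function names are hypothetical), using exact fractions so that dyadic steps are represented without rounding.

```python
from fractions import Fraction

def left_adjacent(a, p):
    """Smallest x in [a, p) of the form p - 2^-k (k >= 1), or None if p == a."""
    if p <= a:
        return None
    step = Fraction(1, 2)
    while p - step < a:
        step /= 2
    return p - step

def right_adjacent(b, p):
    """Largest x in (p, b) of the form p + 2^-k (k >= 1); requires p < b."""
    step = Fraction(1, 2)
    while p + step >= b:
        step /= 2
    return p + step

def adjacent_set(a, b, p, delta):
    """All t-left- and t-right-adjacents of p (t >= 0) lying in
    [a + delta, b - delta), for delta > 0."""
    inside = lambda x: a + delta <= x < b - delta
    points = {p} if inside(p) else set()
    x = left_adjacent(a, p)
    while x is not None and x >= a + delta:   # left-adjacents decrease towards a
        if inside(x):
            points.add(x)
        x = left_adjacent(a, x)
    x = right_adjacent(b, p)
    while x < b - delta:                      # right-adjacents increase towards b
        if inside(x):
            points.add(x)
        x = right_adjacent(b, x)
    return points
```

For the interval [3/16, 11/16) with p = 1/2 and margin 1/32, the set consists of p itself, one left-adjacent and one right-adjacent, well within the logarithmic size bound of Lemma 10.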



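Lemma 11: Condition (6) is satisfied if and only if
$\mathcal{I}^{\mathcal{E}}(x^{n+d})$ does not contain forbidden points of $\mathcal{I}^{\mathcal{E}}(x^n)$, i.e.,

$\mathcal{I}^{\mathcal{E}}(x^{n+d}) \cap S_0(\mathcal{I}^{\mathcal{E}}(x^n)) = \emptyset$

Proof: Write $m = m(\mathcal{I}^{\mathcal{E}}(x^n))$ for short. As already
discussed, if m is strictly contained in $\mathcal{I}^{\mathcal{E}}(x^{n+d})$ then (6) is
not satisfied. Otherwise, assume $\mathcal{I}^{\mathcal{E}}(x^{n+d})$ lies to the left of
m. Clearly, if $\mathcal{I}^{\mathcal{E}}(x^{n+d}) \subseteq [\ell(m), m)$, then $\mathrm{bin}(\mathcal{I}^{\mathcal{E}}(x^{n+d})) \subseteq
[\ell(m), m)$ as well, hence (6) is satisfied. However, if $\ell(m)$ is
strictly contained in $\mathcal{I}^{\mathcal{E}}(x^{n+d})$ then $\mathrm{bin}(\mathcal{I}^{\mathcal{E}}(x^{n+d}))$ must be
the left half of $\mathrm{bin}(\mathcal{I}^{\mathcal{E}}(x^n))$, which by minimality cannot be
a subinterval of $\mathcal{I}^{\mathcal{E}}(x^n)$, hence (6) is not satisfied. The same
rationale also applies to r(m). The lemma follows by iterating
the argument. ■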

C. Proof of Theorem 1

The probability that the delay $\Delta^{\mathcal{E}}(x^n, P)$ is larger than d
is equal to (or upper bounded by, when there are holes in the
interval-mapping, see Remark 6) the probability that (6) is not
satisfied. By Lemma 11, this in turn equals the probability that
$\mathcal{I}^{\mathcal{E}}(X^{n+d})$ contains at least one of
the forbidden points of $\mathcal{I}^{\mathcal{E}}(x^n)$. To get a handle on this latter
probability, the following lemma is useful.

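Lemma 12: Suppose a source P is encoded using an arithmetic
encoder $\mathcal{E}$ matched to a source Q, and let $p_{\max}$, $q_{\max}$
be the corresponding maximal symbol probabilities. Then for
any $a \in \mathcal{I}^{\mathcal{E}}(x^n)$,

$P\left(a \in \mathcal{I}^{\mathcal{E}}(X^{n+d}) \,|\, X^n = x^n\right) \le p_{\max}^d$

and for any interval $J \subseteq \mathcal{I}^{\mathcal{E}}(x^n)$ sharing an endpoint with
$\mathcal{I}^{\mathcal{E}}(x^n)$,

$P\left(J \cap \mathcal{I}^{\mathcal{E}}(X^{n+d}) \ne \emptyset \,|\, X^n = x^n\right) \le \left( \frac{|J|}{|\mathcal{I}^{\mathcal{E}}(x^n)|} + q_{\max}^d \right) \nu(P, Q)^d$

Proof: The set $\{\mathcal{I}^{\mathcal{E}}(x^n y^d) : y^d \in \mathcal{X}^d\}$ is a partition of
$\mathcal{I}^{\mathcal{E}}(x^n)$ into intervals, and a belongs to a single interval in the
partition. Therefore,

$P\left(a \in \mathcal{I}^{\mathcal{E}}(X^{n+d}) \,|\, X^n = x^n\right) \le \max_{y^d} P\left(X_{n+1}^{n+d} = y^d \,|\, X^n = x^n\right) = p_{\max}^d$ (8)

establishing the first assertion. For the second assertion, write:

$P\left(J \cap \mathcal{I}^{\mathcal{E}}(X^{n+d}) \ne \emptyset \,|\, X^n = x^n\right) \le \sum_{y^d : J \cap \mathcal{I}^{\mathcal{E}}(x^n y^d) \ne \emptyset} P(y^d)$

$\le \sum_{y^d : J \cap \mathcal{I}^{\mathcal{E}}(x^n y^d) \ne \emptyset} Q(y^d) \cdot \nu(P, Q)^d \le \left( \frac{|J|}{|\mathcal{I}^{\mathcal{E}}(x^n)|} + q_{\max}^d \right) \nu(P, Q)^d$ (9)

where we have used the fact that $\max_{y^d} Q(y^d) = q_{\max}^d$.

■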

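Write $S_\delta = S_\delta(\mathcal{I}^{\mathcal{E}}(x^n))$ for short. Note that $S_\delta \subseteq S_0$,
and that $S_0 \setminus S_\delta$ is contained in two intervals of length $\delta$ both
sharing an edge with $\mathcal{I}^{\mathcal{E}}(x^n)$. For any $\delta > 0$, the delay's tail
probability is bounded as follows^8:

$P(\Delta^{\mathcal{E}}(x^n, P) > d)$

(a) $\le P\left( \mathrm{bin}(\mathcal{I}^{\mathcal{E}}(X^{n+d})) \not\subseteq \mathcal{I}^{\mathcal{E}}(x^n) \,|\, X^n = x^n \right)$

(b) $= P\left( S_0 \cap \mathcal{I}^{\mathcal{E}}(X^{n+d}) \ne \emptyset \,|\, X^n = x^n \right)$

(c) $\le P\left( (S_0 \setminus S_\delta) \cap \mathcal{I}^{\mathcal{E}}(X^{n+d}) \ne \emptyset \,|\, X^n = x^n \right) + P\left( S_\delta \cap \mathcal{I}^{\mathcal{E}}(X^{n+d}) \ne \emptyset \,|\, X^n = x^n \right)$

(d) $\le 2 \left( \frac{\delta}{|\mathcal{I}^{\mathcal{E}}(x^n)|} + q_{\max}^d \right) \nu(P, Q)^d + p_{\max}^d\, |S_\delta|$

(e) $\le 2 \left( \frac{\delta}{|\mathcal{I}^{\mathcal{E}}(x^n)|} + q_{\max}^d \right) \nu(P, Q)^d + p_{\max}^d \left( 1 + 2 \log\frac{|\mathcal{I}^{\mathcal{E}}(x^n)|}{\delta} \right)$ (10)

The transitions are justified as follows:

(a) Condition (6) is sufficient, see discussion in Subsection
III-A. In most cases this would be an equality, as condition
(6) would be also necessary, see Remark 6.

(b) Lemma 11.

(c) Union bound over $S_0 = S_\delta \cup (S_0 \setminus S_\delta)$.

(d) Lemma 12, together with a union bound over the finite
number of elements in $S_\delta$.

(e) Lemma 10.

Taking the derivative of the right-hand side of (10) w.r.t. $\delta$ we
find that $\delta = |\mathcal{I}^{\mathcal{E}}(x^n)| \cdot \log e \cdot \left( \frac{p_{\max}}{\nu(P, Q)} \right)^d$ minimizes the bound.
Substituting into (10) and noting that the bound is independent
of $x^n$, (5) is proved. ■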

IV. A Lower Bound for $\underline{E}(P)$

In this section we use the delay's probability tail distribution 
mentioned in the previous section, to derive an upper bound 
for the redundancy-delay function, via a specific arithmetic 
coding scheme. We emphasize that unlike [17], the presented
scheme is error free, hence there is zero probability of buffer 
overflow. Moreover, our figure of merit is the delay in source 
symbols vs. the redundancy in bits per symbol. 

A. A Finite Delay Result 

Theorem 2: The redundancy-delay function for a source P 
is upper bounded by 

$\mathfrak{R}(P, d) \le 2 p_{\max}^{d - c(p_{\max})} \left( (d - c(p_{\max})) \log\frac{2}{p_{\max}} + 1 + \kappa \right)^2$

(11) 

where 



c{x) = 







2 [log (2/2;) 



X < 
1 o.w. 



16 



Corollary 3: The inf-redundancy-delay exponent for a
source P is lower bounded by

$\underline{E}(P) \ge \log(1/p_{\max})$

^8 Observe that (10) holds even if $\delta > |\mathcal{I}^{\mathcal{E}}(x^n)|$, in which case our bound
becomes trivial.



Proof: Let us first describe the high-level idea behind 
the proof. We extend the source's alphabet by adding two 
fictitious symbols, and then encode the source using a slightly 
mismatched arithmetic encoder. The encoder keeps track of 
the decoding delay, and whenever the delay reaches d + 1, it 
inserts a fictitious symbol that nullifies the delay. There are 
three key points: 1) There exists a mapping such that there is 
always at least one fictitious symbol whose interval contains 
no forbidden points, 2) The length assigned to the fictitious 
symbols can be made very small, and 3) The probability of 
insertion, bounded via Theorem 1, is also very small.
For any interval I = [a, b), let

$\varphi_I(\lambda) \triangleq (1 - \lambda) a + \lambda b$

and define the two disjoint subintervals

$I_L \triangleq [\varphi_I(3/8), \varphi_I(1/2)), \quad I_R \triangleq [\varphi_I(1/2), \varphi_I(5/8))$

The first key point is established in the following lemma.

Lemma 13: For any interval $I \subseteq [0, 1)$, either $I_L \cap S_0(I) = \emptyset$
or $I_R \cap S_0(I) = \emptyset$.

Proof of Lemma 13: Write m = m(I) for short.
Without loss of generality, assume that $m < \varphi_I(1/2)$. There
are two cases:

(1) $m < \varphi_I(3/8)$: It is easily verified that the right adjacent
of m satisfies $r(m) \ge \varphi_I(1/2)$, as otherwise

$m + 2(r(m) - m) \in I$

contradicting the maximality in the definition of the right
adjacent. Therefore in this case $I_L$ contains no forbidden
points of I.

(2) $m \ge \varphi_I(3/8)$: By our assumption $m < \varphi_I(1/2)$, hence

$r(m) - m > \frac{\varphi_I(1) - \varphi_I(1/2)}{2}$

Rewriting, we have

$r(m) > m + \frac{\varphi_I(1) - \varphi_I(1/2)}{2} \ge \varphi_I(5/8)$

and therefore $I_R$ contains no forbidden points.

■ 
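
Lemma 13 lends itself to a quick randomized sanity check. The sketch below (hypothetical function names, exact rational arithmetic) lists the forbidden points that fall in a given middle subinterval and verifies that at least one of the two subintervals $I_L$, $I_R$ is free of them. It is an empirical check on sampled intervals, not a substitute for the proof.

```python
from fractions import Fraction

def bin_midpoint(a, b):
    """Midpoint of the minimal binary interval containing [a, b), 0 <= a < b <= 1."""
    lo, hi = Fraction(0), Fraction(1)
    while True:
        mid = (lo + hi) / 2
        if b <= mid:
            hi = mid
        elif a >= mid:
            lo = mid
        else:
            return mid

def forbidden_points_in(a, b, lo, hi):
    """Forbidden points of [a, b) that lie in [lo, hi), assuming a < lo < hi < b."""
    m = bin_midpoint(a, b)
    found = [m] if lo <= m < hi else []
    x = m
    while x > lo:                      # left-adjacents decrease towards a
        step = Fraction(1, 2)
        while x - step < a:
            step /= 2
        x -= step
        if lo <= x < hi:
            found.append(x)
    x = m
    while x < hi:                      # right-adjacents increase towards b
        step = Fraction(1, 2)
        while x + step >= b:
            step /= 2
        x += step
        if lo <= x < hi:
            found.append(x)
    return found

def lemma13_holds(a, b):
    """True if I_L or I_R contains no forbidden point of I = [a, b)."""
    phi = lambda lam: (1 - lam) * a + lam * b
    left = forbidden_points_in(a, b, phi(Fraction(3, 8)), phi(Fraction(1, 2)))
    right = forbidden_points_in(a, b, phi(Fraction(1, 2)), phi(Fraction(5, 8)))
    return not left or not right
```

For the unit interval, the midpoint 1/2 is itself a forbidden point and falls in $I_R$, while $I_L$ is free of forbidden points, so the encoder in the construction would insert the left fictitious symbol.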

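Returning to the proof of Theorem 2, define an extended
alphabet $\mathcal{X}^+ = \mathcal{X} \cup \{x_L, x_R\}$ where $x_L, x_R$ are two fictitious
symbols. Let $P^+ \in \mathcal{P}(\mathcal{X}^+)$ be the corresponding extension of
the source P to $\mathcal{X}^+$, assigning zero probability to the fictitious
symbols. For $0 < \varepsilon < p_{\max}$, let $\tilde{P}^+ \in \mathcal{P}(\mathcal{X}^+)$ be a source
with the following symbol probabilities:

$\tilde{P}^+(x) = (1 - 2\varepsilon) P(x)$ for $x \in \mathcal{X}$, and $\tilde{P}^+(x) = \varepsilon$ for $x \in \{x_L, x_R\}$

Clearly, $\max_x \tilde{P}^+(x) = (1 - 2\varepsilon) p_{\max} < p_{\max}$ and $\nu(P^+, \tilde{P}^+) = \frac{1}{1 - 2\varepsilon}$.

Let $\le$ be any order of $\mathcal{X}$. Since $\tilde{P}^+(x) \le \frac{1}{16}$ for
all $x \in \mathcal{X}^+$ whenever $p_{\max} \le \frac{1}{16}$, and since the intervals
of Lemma 13 satisfy $|I_L| = |I_R| = |I|/8$, it is easy to see that there
exists an order $\le^+$ of $\mathcal{X}^+$
that preserves $\le$ over $\mathcal{X}$, such that the arithmetic encoder $\mathcal{E}$
w.r.t. $\le^+$ matched to $\tilde{P}^+$ has the fictitious symbols $x_L, x_R$
mapped into intervals contained in $\mathcal{I}^{\mathcal{E}}(x^n)_L$ and $\mathcal{I}^{\mathcal{E}}(x^n)_R$,
respectively. If the condition on $p_{\max}$ is not satisfied, then
we can always aggregate a few symbols into a super-symbol,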



so that the maximal product probability satisfies the required
condition (the effect of this aggregation on the delay is treated
later on). To encode the source $P^+$, let us now use the
arithmetic encoder for $\tilde{P}^+$ above together with the following
fictitious symbol insertion algorithm: The encoder keeps track
of the decoding delay by emulating the decoder. Whenever
this delay reaches d + 1, the encoder finds which one of
$\mathcal{I}^{\mathcal{E}}(x^n)_L$ or $\mathcal{I}^{\mathcal{E}}(x^n)_R$ contains no forbidden point, and inserts
the corresponding fictitious symbol $x_L$ or $x_R$ respectively,
hence nullifying the decoding delay. This way, the decoding
delay never exceeds d and no errors are incurred.
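
The two quantities that drive the redundancy analysis of this construction, the mismatch factor and the divergence between the extended source and the slightly mismatched coding distribution, can be computed directly. The sketch below builds both distributions for a given ε; the function names and the symbol labels `"x_L"` and `"x_R"` are placeholders chosen for this illustration.

```python
import math
from fractions import Fraction

def extend_source(p, eps, left="x_L", right="x_R"):
    """Return (p_plus, p_tilde): the source extended with two zero-probability
    fictitious symbols, and the coding distribution giving each of them mass eps."""
    p_plus = dict(p)
    p_plus[left] = Fraction(0)
    p_plus[right] = Fraction(0)
    p_tilde = {s: (1 - 2 * eps) * v for s, v in p.items()}
    p_tilde[left] = eps
    p_tilde[right] = eps
    return p_plus, p_tilde

def nu(p, q):
    """Mismatch factor: largest ratio p(x)/q(x) over symbols with p(x) > 0."""
    return max(p[s] / q[s] for s in p if p[s] > 0)

def kl_divergence(p, q):
    """D(p || q) in bits; symbols with p(x) = 0 contribute nothing."""
    return sum(float(v) * math.log2(v / q[s]) for s, v in p.items() if v > 0)
```

For any source and any ε in (0, 1/2), the mismatch factor evaluates to 1/(1 − 2ε) and the divergence to log(1/(1 − 2ε)), independently of the source itself, which is why the mismatch penalty can be made small by choosing ε small.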

We now bound the redundancy incurred by the encoder
$\mathcal{E}' \in \mathfrak{E}_d$ described above. There are two different sources of
redundancy. The first is due to the mismatch between $P^+$
and $\tilde{P}^+$, and the second is due to the coding of the inserted
fictitious symbol. At each time k > d, the probability $w_k$ for
an insertion can be bounded via Theorem 1:

.£\X^-'^,P) >d)< P(A^'(P) > d) 
<2pi^^(d\og( ^ 



+ 2(1 

^max 



(1 - 2£)pniax 

2erpt..{l 26)-" 
dlog ' 



, (1 - 2£)p,„, 

Now, let $P^{+n}$ be the $n$-product of $P^+$, and write

<(P) = <(P+) < Pf (P+) = li?(P+«||^f ) 

n 

- yE(D{p+\\f4'{-\x''-' 



(12) 



k=\ 



(c) 



i?(P+||P+) + -log-^ 



fe=l 



(d) 

< 2 log 



^max 



dice 



(1 - 2£)pn 



Ids 



1 - 2e 



max 



2d log 



1 



-4e 



^ Pmax ' 

The transitions are justified as follows: 

(a) Lemma I2] 

(b) The chain rule for the divergence, and the fact that $P^{+n}$
is a product (memoryless) distribution.

(c) Given $X^{k-1}$, the conditional measure induced by $\mathcal{E}'$ follows
$P^+_\varepsilon$ with an extra multiplication by $\varepsilon$ if and only if $X^{k-1}$
is such that there is an insertion. Hence the expected divergence
given $X^{k-1}$ always yields the term $D(P^+\|P^+_\varepsilon)$, and an extra
$\log(1/\varepsilon)$ multiplied by the probability of an insertion $W_k$.

(d) The bound for $W_k$ given above, and $D(P^+\|P^+_\varepsilon) = \log\frac{1}{1-2\varepsilon}$.

(e) $\log\frac{1}{1-2\varepsilon} < 4\varepsilon$ for $0 < \varepsilon < \frac{1}{16}$.

Setting $\varepsilon = p_{\max}^d$, we get:

9lf (P) < 2pf„,i(ilog (^—] + K, + 1) dlog + 4pt. 



<2pS.axUlog 



Pmax 

2 

Pmax 



1 



(13) 



Finally, we address the case where $p_{\max} \geq \frac{1}{16}$. As mentioned
before, we aggregate a minimal number of source
symbols $k$ into a super-symbol, such that $p_{\max}^k < \frac{1}{16}$. This
means that 1 < fc < ^ ^^^ — J . We now carry out the above 
procedure for the fc-product alphabet. However, since decoding 
is performed fc symbols at a time, we set our delay threshold 
to be d = [^x^ ^ ij ■ Substituting the above into ( fT3] l we get 

^nf (P) < 2p£, (dlog(2/pLj + K + 1 ' 



{{d - C(pmax)) log (2/pmax) + K + I) 



Remark 7: The scheme described above also allows the
encoder to change the delay constraint on the fly, by inserting
a suitable fictitious symbol in accordance with the modified
constraint. Once the decoder is made aware of this change,
both encoder and decoder need to simultaneously adjust the
probability of the fictitious symbols.

B. An Asymptotic Result 

Theorem 3: The inf-redundancy-delay exponent for a
source $P$ is lower bounded by

$$E(P) \geq H_2(P)$$

Proof of Theorem 3: We construct a unit delay encoder
for the product source $P^d$ using fictitious symbols in a similar
way as done in Theorem 2, with an additional random coding
argument. Let $<$ be an order of $\mathcal{X}^d$ such that all super-symbols
in the same type class are adjacent (and otherwise arbitrary).
Let $<_{y^d}$ be a new order which is obtained by a rotation of
the order $<$, such that $y^d$ is the smallest element w.r.t. $<_{y^d}$,
and such that the largest element satisfying $z^d < y^d$ is
the largest element w.r.t. $<_{y^d}$. Finally, let $<^+_{y^d}$ be the order
of $\mathcal{X}^{d+} = \mathcal{X}^d \cup \{x_L, x_R\}$ that preserves $<_{y^d}$ over $\mathcal{X}^d$, such
that the arithmetic encoder $\mathcal{E}$ w.r.t. $<^+_{y^d}$ matched to $P^{d+}_\varepsilon$ has the
fictitious symbols $x_L, x_R$ mapped into intervals contained in
$\mathcal{I}^{\mathcal{E}}(x^n)_L$ and $\mathcal{I}^{\mathcal{E}}(x^n)_R$, respectively, and are of the minimal
order satisfying this.

Let us now draw an i.i.d. sequence $(Y_1^d, Y_2^d, \ldots)$ with a
marginal $P^d$, independent of the source sequence. At time
instance $k$ (where time is now w.r.t. the product source),
we use an arithmetic encoder w.r.t. the random order $<^+_{Y_k^d}$,
and matched to $P^{d+}_\varepsilon$. Denote the associated random interval-mapping
encoder by $\mathcal{E}$. It is easy to see that for any point
$a \in \mathcal{I}^{\mathcal{E}}(x^{dk})$, the probability that the interval corresponding
to a type $Q$ will include $a$ is upper bounded by $p_{\max}^d$ plus the
probability of the type class $T_Q$ under $P^d$, where by Lemma
2 the latter is upper bounded by $2^{-dD(Q\|P)}$. By the same
Lemma, the probability of any super-symbol within the type
class $T_Q$ is $2^{-d(D(Q\|P)+H(Q))}$. Thus,



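Taking the limit as $d \to \infty$, and since there is only a
polynomial number of types, we obtain

$$\lim_{d\to\infty} -\frac{1}{d}\log P\left(a \in \mathcal{I}^{\mathcal{E}}(X^{d(k+1)}) \,\middle|\, X^{dk} = x^{dk}\right) \geq \inf_{Q \in \mathcal{P}(\mathcal{X})} \left\{ D(Q\|P) + H(Q) + \min\left\{ D(Q\|P), \log\frac{1}{p_{\max}} \right\} \right\}$$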



Let $V(Q)$ denote the function over which the infimum above is
taken, and assume without loss of generality that $P$ is strictly
nonzero over $\mathcal{X}$. $V(Q)$ is continuous and the infimum is taken
over a compact set, hence is attained for some $Q^* \in \mathcal{P}(\mathcal{X})$.
Suppose that $D(Q^*\|P) > \log(1/p_{\max})$. Let $x \in \mathcal{X}$ be such
that $P(x) = p_{\max}$, and suppose there exists $y \in \mathcal{X}$ such
that $P(y) < p_{\max}$ and $Q^*(y) > 0$. Generate a perturbed
distribution $Q_\beta$ by increasing the probability assigned by
$Q^*$ to $x$ by some $\beta > 0$, and decreasing the probability
assigned by $Q^*$ to $y$ by the same $\beta$, leaving the other
probabilities unchanged. Clearly, we have $D(Q_\beta\|P) + H(Q_\beta) <
D(Q^*\|P) + H(Q^*)$. By continuity, there exists $\beta$ small enough
such that $D(Q_\beta\|P) > \log(1/p_{\max})$. Hence $V(Q_\beta) < V(Q^*)$
for such $\beta$, contradicting the minimality of $Q^*$. If such $y$
does not exist, then $P(x) = p_{\max}$ over the entire support
of $Q^*$. Therefore, $D(Q^*\|P) = \log(1/p_{\max}) - H(Q^*) \leq
\log(1/p_{\max})$, in contradiction to our assumption. We conclude
that $D(Q^*\|P) \leq \log(1/p_{\max})$. Hence,

$$\lim_{d\to\infty} -\frac{1}{d}\log P\left(a \in \mathcal{I}^{\mathcal{E}}(X^{d(k+1)}) \,\middle|\, X^{dk} = x^{dk}\right) \geq \min_{Q} \left\{ 2D(Q\|P) + H(Q) \right\} = H_2(P)$$

where Lemma 3 was invoked in the last equality. Continuing
this line of argument, we can essentially replace $p_{\max}^d$ with
$2^{-dH_2(P)}$ for $d$ large enough, throughout our proofs. Therefore,
the redundancy averaged over the ensemble of random
$d$-delay constrained encoders is bounded by



$$\mathbb{E}\left(\mathcal{R}^{\mathcal{E}}(P)\right) = O\left(2^{-dH_2(P)}\right) \tag{15}$$



and thus there exists a deterministic encoder $\mathcal{E}$ achieving at
least that expected performance, concluding the proof. ■
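
The identity $\min_Q \{2D(Q\|P) + H(Q)\} = H_2(P)$ used in the proof can be checked numerically. The sketch below is only an illustration under assumed inputs: the three-symbol source is hypothetical, and the candidate minimizer $Q^*(x) \propto P(x)^2$ is a standard observation rather than something stated in the text. It follows from rewriting the objective as $\sum_x Q(x)\log\frac{Q(x)}{P(x)^2}$.

```python
import math
import random

def renyi2_bits(p):
    """Renyi entropy of order 2, in bits: -log2(sum_x P(x)^2)."""
    return -math.log2(sum(px * px for px in p))

def objective_bits(q, p):
    """2*D(q||p) + H(q) in bits, written as sum_x q(x) log2(q(x) / p(x)^2)."""
    return sum(qx * math.log2(qx / (px * px)) for qx, px in zip(q, p) if qx > 0)

def random_dist(k, rng):
    """A random strictly positive distribution on k symbols."""
    w = [rng.random() + 1e-9 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

p = [0.5, 0.3, 0.2]                      # hypothetical source
z = sum(px * px for px in p)
q_star = [px * px / z for px in p]       # candidate minimizer, proportional to P^2
h2 = renyi2_bits(p)

rng = random.Random(0)                   # fixed seed for reproducibility
trials = [objective_bits(random_dist(len(p), rng), p) for _ in range(1000)]

print(h2, objective_bits(q_star, p), min(trials))
```

No random trial should fall below $H_2(P)$, and the objective at $Q^*$ should coincide with it up to floating-point error.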

V. An Upper Bound for E(P)

In this section we provide an upper bound for the sup- 
redundancy-delay exponent for almost any memoryless source, 
which is meant w.r.t. the Lebesgue measure over the proba- 
bility simplex. 

Theorem 4: For almost any memoryless source $P$, the
sup-redundancy-delay exponent is upper bounded by

$$E(P) \leq 8\log\frac{|\mathcal{X}|}{p_{\min}} \tag{16}$$



(14) 



Remark 8: Note that (16) cannot hold for all sources, e.g.,
for 2-adic sources we can have zero redundancy with zero
delay, hence an infinite exponent.
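
To make the dyadic case of Remark 8 concrete, the following sketch checks that for a source whose probabilities are all integer powers of 1/2, the codeword lengths $-\log_2 P(x)$ are integers, satisfy the Kraft inequality with equality, and give an expected length equal to the entropy. A symbol-by-symbol prefix code with these lengths therefore has zero redundancy and needs no look-ahead. The four-symbol source and the helper names are assumptions for illustration.

```python
import math
from fractions import Fraction

def is_dyadic(p):
    """True if every probability is of the form 2^(-k) for an integer k >= 0."""
    return all(
        px.numerator == 1 and (px.denominator & (px.denominator - 1)) == 0
        for px in p.values()
    )

# Hypothetical dyadic source.
p = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}

# Ideal codeword length -log2 P(x); exact because the denominator is a power of two.
lengths = {x: px.denominator.bit_length() - 1 for x, px in p.items()}

kraft_sum = sum(Fraction(1, 2 ** l) for l in lengths.values())
expected_length = sum(px * lengths[x] for x, px in p.items())
entropy = -sum(float(px) * math.log2(float(px)) for px in p.values())
redundancy = float(expected_length) - entropy

print(is_dyadic(p), lengths, kraft_sum, expected_length, redundancy)
```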

Remark 9: When restricted to interval-mapping encoders
only, a tighter upper bound of $8\log(1/p_{\min})$ holds.



A. Proof Outline 

Since the proof is somewhat tedious, we find it instructive
to provide a rough outline under the assumption that the
encoder admits an interval-mapping representation (rather than
a generalized one). This assumption will be removed in the
proof itself. Due to the strict delay constraint, at any time
instance the encoder must map the next $d$ symbols into
intervals that do not contain any forbidden point.^9 Typically
(for almost every interval), we will find an infinite number of
forbidden points concentrated near the edges, with a typical
"concentration region" whose size depends on the specific
interval. Clearly, the distances between consecutive points
diminish exponentially to zero. Therefore, mapping symbols
to the concentration region will result in a significant mismatch
between the symbol probability and the interval length, and
this phenomenon incurs redundancy. This observation is made
precise in Lemma 14.

Now, loosely speaking, there are two opposing strategies
the encoder may use when mapping symbols to intervals.
The first is to think short-range, namely to be as faithful to
the source as possible by assigning interval lengths closely
matching symbol probabilities (within the forbidden points
constraint). This will likely cause the next source interval to
have a relatively large concentration region, resulting in an
inevitable redundancy at the subsequent mapping. The second
strategy is to think long-range, by mapping to intervals with
a small concentration region. This in general cannot be done
while still being faithful to the source's distribution, hence
this strategy also incurs an inevitable redundancy. The latter
observation is made precise in Lemma 18. Our lower bound
results from the tension between these two counterbalancing
sources of redundancy.

B. Proof of Theorem 4

In light of Lemma 6, we can restrict our discussion to
generalized interval-mapping encoders of the form (O. How- 
ever, we will find it more flexible to consider a broader 
family of generalized interval-mapping encoders, satisfying 
the following conditions: 

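(i) For any $s \in \mathcal{X}^*$, $\mathcal{I}^{\mathcal{E}}(s)$ is a union of at most $|\mathcal{X}|^d$
intervals.^10

(ii) For any $s \in \mathcal{X}^*$, $x^d \in \mathcal{X}^d$, $\mathcal{I}^{\mathcal{E}}(sx^d)$ contains no forbidden
points of any of the intervals comprising $\mathcal{I}^{\mathcal{E}}(s)$.^11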

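Let $I \subseteq [0, 1)$ be a finite union of disjoint intervals $\{I_k\}_{k=1}^{K}$.
Define

$$A(I) = \bigcup_{k=1}^{K} \left\{ \frac{|a-b|}{|I|} : a, b \in S(I_k),\ (a,b) \cap S(I_k) = \emptyset \right\}$$

and let

$$\delta_I = \delta_I(P,d) = \max\left\{ \alpha \in A(I) : \alpha < p_{\min}^d/4 \right\}$$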

^9 As mentioned in Remark 6, avoiding forbidden points is not always a
necessary condition. However, in the next section we verify this is not a
restriction.

^10 To disambiguate the statement, we clarify that any two intervals whose
union is an interval are counted as a single interval.

"Note that this is satisfied by since hin(l^ {sx'^)'j is always contained 
in one of the intervals comprising (s). 



Namely, $\delta_I$ is the maximal distance between two consecutive
forbidden points in some $I_k$, normalized by the measure of $I$,
that is smaller than $p_{\min}^d/4$.

Lemma 14: $r_d(x^n) \geq \delta_{\mathcal{I}^{\mathcal{E}}(x^n)}$

Proof: Let $I = \mathcal{I}^{\mathcal{E}}(x^n)$ throughout the proof. Let

$$z^d = \operatorname*{argmin}_{y^d \in \mathcal{X}^d} \mu_d(y^d \mid x^n)$$

and let $\gamma = \mu_d(z^d \mid x^n)$. If $\gamma < \delta_I$, then $z^d$ has been assigned
a measure at least four times smaller than its probability
$P(z^d)$. The $d$-instantaneous redundancy can be lower bounded
as follows:

(a) (b) 

r,{xn ^ D{P''\\^,,{■\x-)) > DiPi^J^^) > DipiM 

d 1^^ ^min , /i \i ^ /^min 



7 1-7 



(c) 



1 - n'^ . 



Pmin > 



In (a) we have used the data processing inequality for the 
divergenc^H In (b) we have used the fact that 7 < < 
P{z'^) together with the monotonicity of the scalar relative 
entropy. In (c) we have used log(l— p) > for < p < 1. 

If on the other hand $\gamma \geq \delta_I$, then all of the $d$-fold alphabet
has been assigned a measure of at most $1 - \delta_I$, which results
in a $d$-instantaneous redundancy lower bounded by

$$r_d(x^n) \geq \log\frac{1}{1-\delta_I} \geq \delta_I \log e \geq \delta_I$$

■

A number $a \in [0, 1)$ is called $(m, \ell)$-constrained if
a = 0.00^^ !(/)... 0OO_^ (/)... 

m'(a) rri ^ 

where $m'(a)$ is the length of the zeros prefix of $a$, and $\phi$ is
the "don't care" symbol. The $(m, \ell)$-constrained region $\mathcal{C}_{m,\ell}$
is the set of all such numbers. A number $a \in [0, 1)$ is called
$(m, \ell)$-violating if



a = 0.00. 



(17) 



e bits, not all '0' or all '1' 



The $(m, \ell)$-violating region $\mathcal{V}_{m,\ell}$ is the set of all such numbers.
The complement $\overline{\mathcal{V}}_{m,\ell} = [0,1) \setminus \mathcal{V}_{m,\ell}$ is called the
$(m, \ell)$-permissible region. Define the regions^13

LCraJ (- log Cm, , LVraJ =^ (- \ogV m,l) 

and let 

The following two lemmas are easily observed. 

Lemma 15: Let /z > 0. If a £ Vmi and h £ Cm/' where 
£ < £', then 

la - 61 > 2-"'(°) • 2-("+^) > - ■ 2-^™+'^) 
I I - - 2 

^12 Recall that $\mu_d(\cdot \mid x^n)$ sums to at most unity, hence can be complemented
to a probability distribution by adding an auxiliary symbol $\omega$ to $\mathcal{X}^d$ and
defining $P^d(\omega) = 0$.

^13 The $\log$ and $\langle\cdot\rangle$ operations are taken pointwise on the set elements.



Lemma 16: If $I, J \subseteq [0, 1)$ are each a union of at most $M$
intervals of size no larger than $r$ each, then $\langle I - J \rangle$ can be
written as a union of at most $M^2 + 1$ intervals of size no larger
than $2r$ each.

The $(m, \ell)$-permissible region within the interval $[1/2, 1)$ is
comprised of $2^{m-1} + 1$ subintervals. By definition, the size of
each is upper-bounded by $2^{-(m'+m+\ell)+1}$. Applying $\langle -\log(\cdot) \rangle$
to all such intervals in the $[1/2, 1)$ interval (corresponding to
$m' = 0$) will stretch each of them by a factor of at most
$2\log e < 4$. All other permissible intervals (those with $m' >
0$) coincide on the unit interval after applying the $\langle -\log(\cdot) \rangle$
operator. Hence $L\mathcal{V}_{m,\ell}$ can be written as a union of at most
$2^{m-1} + 1$ intervals, each of size at most $2^{-(m+\ell)+3}$. A similar
argument shows that $L\mathcal{C}_{m,\ell}$ can also be written that way.^14
Appealing to Lemma 16, $\mathcal{V}^{(1)}_{m,\ell}$ can be written as a union of at
most $(2^{m-1}+1)^2+1$ intervals, each of size at most $2^{-(m+\ell)+4}$.
Applying the Lemma again, we find that $\mathcal{V}^{(2)}_{m,\ell}$ can be written
as a union of at most $((2^{m-1}+1)^2+1)^2+1 \leq 2^{4m+1}$ intervals
each of size at most $2^{-(m+\ell)+5}$. Hence,



(2) 



A source $P$ is called $(\mu_0, \lambda)$-regular if there exists a pair
of symbols $y, z \in \mathcal{X}$ and $m_0 \in \mathbb{N}$ such that for any $\mu > \mu_0$



A 



m— mo 



(19) 



Remark 10: £ 2?, 



(2) 



for any m and /i, hence no 



source can be (/io, 0)-regular. Since for a dyadic source A = 
for any pair y. z, a dyadic source is never (/io, A)-regular 

The following two lemmas establish some properties of
$(\mu_0, \lambda)$-regularity.

Lemma 17: Let $\mu_0 > 3$. Almost any source is $(\mu_0, \lambda)$-regular
for some $\lambda > 0$.

Proof: Note that C,n,i+i C C,n,t and V,n,i+i D Vm,t., 
hence V^^,,^ C V^^\. By we have that for any /fo > 3 



lim 

mo— >-oo 



u u ^ 



— lim 

mo— >-oo 



11 



m] 



tn—niQ 



< lim V 2"(3"'^«)+^ 

mo— >C30 ^- — ^ 
m— mo 

2mo(3-Mo)+6 

= lim — 

mo->-oo 1 — 2^^~^^'> 



The statement of the lemma follows easily. 
Define the following set: 



Lemma 18: Suppose $P$ is a $(\mu_0, \lambda)$-regular source. Then
for any $\alpha, \beta > 0$ with $\beta/\alpha > \mu_0$,

$$\liminf_{d\to\infty} P(A^d_{\alpha,\beta}) \geq \frac{1}{2}$$

^14 It can in fact be written as a union of fewer and smaller intervals, but that
adds nothing to our argument.



Proof: We will assume hereinafter that e < ^Pmin- Let 
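$y, z$ be the symbols attaining $\lambda$, and define a transformation
$\sigma : \mathcal{P}^d(\mathcal{X}) \to \mathcal{P}^d(\mathcal{X})$ on types:

$$\sigma(Q)(x) = \begin{cases} Q(x) & x \notin \{y, z\} \ \vee\ Q(y) = 0 \\ Q(x) - d^{-1} & x = y \ \wedge\ Q(y) > 0 \\ Q(x) + d^{-1} & x = z \ \wedge\ Q(y) > 0 \end{cases} \tag{20}$$

Namely, $\sigma$ exchanges one appearance of $y$ with an appearance
of $z$ as long as this is possible, i.e., as long as $Q(y) > 0$. Now,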
suppose d > iog("/p ) so that (fT9b is satisfied. Noting that 
the set ^ is a union of type classes, let Q £ ^^{X, P) be 
a type such that Tg n ^ = 0. Clearly a{Q) ^ Q, and for 
any a;'' e Tq and x'^ e ^^-(q), 

(-logP(S'^)) = (-logP(a;'^) + A) 



Now since A 2?' 



(2) 



for any m > mo and /i > /io. 
Recalling the 



and since /3/a > /io, then A ^ '^\ad^ 



(18) definition of V) 



'\ad'\ \i3d'\ appealing to Lemma [T] we have 
that (— log P(-e'')) ^ -^[od] [/3d]' hence we conclude that 
cr{Q) 6 A'^ p. Therefore, since a is one-to-one when restricted 
to ^^{X, P), then a uniquely matches any type in ^^{X, P) 
that is outside A'^ ^, to a type that is inside ^. 
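
The type transformation in (20) can be sketched on symbol counts, where a type with denominator $d$ is stored as a dictionary of counts summing to $d$. The example below also checks that swapping one $y$ for one $z$ shifts $-\log P$ of any sequence in the type class by a fixed amount. Here that amount is taken to be $\lambda = \log\frac{P(y)}{P(z)}$, which is an assumption inferred from the displayed shift identity; the source, the symbol names and the helper functions are hypothetical.

```python
import math
from collections import Counter

def sigma(counts, y, z):
    """Exchange one occurrence of y for one occurrence of z in a type
    given as symbol counts; act as the identity if y does not occur."""
    out = dict(counts)
    if out.get(y, 0) == 0:
        return out
    out[y] -= 1
    out[z] = out.get(z, 0) + 1
    return out

def neg_log_prob_bits(counts, p):
    """-log2 P(x^d) for any sequence x^d whose symbol counts are `counts`."""
    return -sum(c * math.log2(p[x]) for x, c in counts.items())

p = {"y": 0.5, "z": 0.2, "w": 0.3}       # hypothetical source
counts = Counter("yyzww")                # a type with denominator d = 5
swapped = sigma(counts, "y", "z")

lam = math.log2(p["y"] / p["z"])         # assumed definition of lambda
shift = neg_log_prob_bits(swapped, p) - neg_log_prob_bits(counts, p)

print(swapped, shift, lam)
```

Because the shift is the same for every sequence in the class, $\sigma$ moves an entire type class rigidly along the $-\log P$ axis, which is what the matching argument above relies on.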

Let us now get a handle on the variation in the probability 
of a type class incurred by applying a. It is easy to check that 
for any Q e ^^{X,P), and n large enough. 



P{T,iQ)) > P{Tq) 



{P{y)~e)d \ (P{z) 

Piy) 



{P{z)+e)d 
e 



= P(Ta){l + 0(e) + 0(d-^)) 



Namely, the probability of a type class for a type Q £ 
^^{X, P) under P, remains almost the same after applying 
a. Therefore: 



1 - P{At 



<p\ U E p^Tq) 

P{T.(Q)) 



<o{l)+ E 

P{Tq) 



l + 0(£) + 0(d-i) 



< o 



0(e) + 0(fi-i) 



0(1) 



P{AU) 



l + 0(e) + 0(d-i) 



where we have used the AEP (Lemma 2) in the second
inequality. The result now follows by rearranging the terms
above, taking the limit as $d \to \infty$, and noting that $\varepsilon > 0$ can
be taken to be arbitrarily small. ■



From this point forward we assume $P$ is $(\mu_0, \lambda)$-regular
with $\mu_0 > 3$. Let $\mu < \mu'$, and define the indexed sets

For d Bk, Lemma [T4l impUes that 

rdix'^) > Pfnln (21) 

On the other hand, x'^ ^ B^ impUes that the 
length of each interval comprising {x^) must be in 
C[-diog(i/p,„i„)],rMiog(i/p„i„)]- Since there are at most \X\'^ 
such intervals, it must be that 



\ad'\ , \l3d'\ 



(22) 



where

$$\alpha = \log(1/p_{\min}) + \log|\mathcal{X}|, \qquad \beta = \mu\log(1/p_{\min}) - \log|\mathcal{X}| \tag{23}$$



Similarly, if y'' ^ C(x'') then 
where 

$$\beta' = \mu'\log(1/p_{\min}) - \log|\mathcal{X}|$$

For Lemma 18 to apply, we set $\mu, \mu'$ such that $\beta/\alpha > \mu_0$ and
$\beta'/\alpha > \mu_0$. This yields the constraints:

$$\mu' > \mu > \mu_0 + \frac{(\mu_0+1)\log|\mathcal{X}|}{\log(1/p_{\min})}$$

In what follows, we will think of $\mu'$ as arbitrarily close to $\mu$.
For any x'' ^ B^ we have: 

E \Piy'')-^^^{y'\x') 

+p':itPic{x>^)) 



> 



E 



P(/)|X^(a;'=)| - \I^{x''y'^)\ 



> 



\l£{x'' 



/ J 9 ^min 



+ P^tP{C{x'^)) 



JcJ 1 

> - 

- 4 

ID 1 r 

> - 

- 4 



{p{Ai^^nc{x''))f+p{Cix')) 



dma,x{2{a+P),fi')+4 



(PiAip) - P{Ai^p n C{x''))f+ P{At^0 n Cix'^)) 

2rf(M+l)log(l/p„i„)+4 



^ /'Pi'/lrf 2d(M+l)log(l/p„,.„)+4 

= rj-^oCn^ 2d(A<+l)log(l/p„i„)+4 
y j^g ^ \ J J "mill 



(24) 



The inequalities are justified as follows: 

(a) Pinsker's inequality for the divergence was used, together
with Lemma 14 and the nonnegativity of $r_d(\cdot)$.

(b) (|22] | and ( |23] ) hold for all the union-of-intervals lengths 



in the summation. Since {— log P{y'^)) ^ Vi^ 



(1) 

[ad] , [Pd] 



for 

each y"^ in the summation, then appealing to Lemma [T] 
we have that P{y'^)\I^ {x'')\ G V^ad],[i3'd]- The inequality 
now follows by virtue of Lemma [15] 

(c) $P(A \cap \overline{C}) = P(A) - P(A \cap C)$ and $P(C) \geq P(A \cap C)$.

(d) $\mu'$ can be taken to be arbitrarily close to $\mu$.

(e) Lemma 18 was used to lower bound the probability of the
set $A^d_{\alpha,\beta}$.

Combining (21) and (24), we get:

E(rd(X'=)+rd(X'=+'^)) 



> min ( p'^.^, 



16 



\ ) 2d(M+l)log(l/p,„i„)+4 



1 

16 



o(r 



■p 



2d(Ai+l)log(l/p„i„)+4 



This holds for any $d$-constrained encoder $\mathcal{E} \in \mathscr{E}_d$; hence,
plugging into Lemma 7, we get



mf^iP) 



liminf — E ]E(^d(^') + rdiX''+'')) 
n->oo zna ^ — ' 



fe=i 



> 



1 

16 



-o(l) 



2d 



■P 



2<i(Ai+l)log(l/p„i„)+4 



This lower bound holds for any $\mu > \mu_0 + \frac{(\mu_0+1)\log|\mathcal{X}|}{\log(1/p_{\min})}$.
Moreover, by Lemma 17 almost any source is $(\mu_0, \lambda)$-regular
for any $\mu_0 > 3$. Therefore, we have that for almost any source



m^{P) > 
and hence 



1 
16 



o{i: 



1 

2d 



Sdlog 



E{P) < 8 log 



■Pn 



\X\ 



-)+o{d) 



As mentioned in Remark 9, if the encoder is restricted to
be interval-mapping then a tighter upper bound of $8\log(1/p_{\min})$
holds. In this case $\mathcal{I}^{\mathcal{E}}(\cdot)$ is a single interval rather than a union
of $|\mathcal{X}|^d$ intervals, hence the proof remains the same up to the
substitution $|\mathcal{X}| \to 1$.
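
The constant 8 in Theorem 4 can be traced through the parameter choices in the proof. Taking the per-unit-delay exponent in the lower bound to be $2(\mu+1)\log(1/p_{\min})$, and letting $\mu$ approach its threshold $\mu_0 + \frac{(\mu_0+1)\log|\mathcal{X}|}{\log(1/p_{\min})}$ with $\mu_0$ approaching 3, gives $2(\mu_0+1)\log\frac{|\mathcal{X}|}{p_{\min}} = 8\log\frac{|\mathcal{X}|}{p_{\min}}$. The sketch below checks this arithmetic numerically; the function name and the example values of $|\mathcal{X}|$ and $p_{\min}$ are assumptions for illustration.

```python
import math

def exponent_at_threshold(alphabet_size, p_min, mu0=3.0):
    """Evaluate 2*(mu + 1)*log2(1/p_min) at the limiting value
    mu = mu0 + (mu0 + 1)*log2(|X|)/log2(1/p_min)."""
    log_inv_pmin = math.log2(1.0 / p_min)
    mu = mu0 + (mu0 + 1.0) * math.log2(alphabet_size) / log_inv_pmin
    return 2.0 * (mu + 1.0) * log_inv_pmin

alphabet_size, p_min = 4, 0.1            # hypothetical source parameters
general = exponent_at_threshold(alphabet_size, p_min)
closed_form = 8.0 * math.log2(alphabet_size / p_min)

# Interval-mapping case of Remark 9: substitute |X| -> 1.
interval_mapping = exponent_at_threshold(1, p_min)

print(general, closed_form, interval_mapping, 8.0 * math.log2(1.0 / p_min))
```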

VI. Conclusions 

The redundancy in lossless coding of a memoryless source 
incurred by imposing a strict end-to-end delay constraint was 
analyzed, and shown to decay exponentially with the delay. 
This should be juxtaposed against traditional results in source 
coding, showing a polynomial decay of the redundancy with 
the delay. In the traditional framework, the delay is identified 
with the block length or the maximal phrase length, which in 
our framework imposes a harsh restriction: The decoder is not 
allowed to start reproducing source symbols in the midst of a 
block/phrase, and the delay is repeatedly nullified at the end 



of each block/phrase. This means the encoder is reset at these 
instances, i.e., the prefix has no effect on its future behavior.
Loosely speaking, the gain of exponential versus polynomial
is reaped via a tighter control over the delay process, making 
such reset events rare. 

Nevertheless, the block/phrase based codes allow the encoder
to start over at roughly constant intervals, and therefore
such coding schemes are more efficient in a precision-limited
setting. The more general encoders discussed in this paper can
hence attain their superior performance by allowing a finer
precision for keeping the encoder's state. However, it should
be noted that only a finite precision is necessary to attain
exponentially decaying redundancy, and that precision can be
easily derived from Lemma 14. Therefore, the redundancy of
our interval-mapping encoder when operating in a resource-limited
setting is dominated by the larger of two sources: the
aforementioned delay-precision constraint, and the external
complexity-precision constraint.

In our framework, we have isolated the impact of the delay 
on the redundancy by letting the transmission time n go to 
infinity. This also makes sense complexity-wise, since the 
per-symbol encoding complexity is determined primarily by 
the delay, and not by the length of the encoded sequence. In 
practice, however, a finite transmission time forces the encoder
to terminate the codeword, which in turn incurs an additional
penalty of $O(n^{-1})$ in redundancy. Setting $d = O(\log n)$
renders this additional redundancy term commensurate with 
the redundancy incurred by the delay constraint. Therefore, 
our results imply that the delay can be made logarithmic in the 
block length, while maintaining the same order of redundancy. 
Conversely, for almost all sources this is the best possible 
tradeoff between block length and delay. A similar statement 
in the context of universal source coding was mentioned in 
lITSl . though for a somewhat different definition of the delay. 
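
The logarithmic relation between delay and block length can be illustrated by balancing the two redundancy terms: a termination penalty of order $1/n$ against a delay penalty of order $2^{-Ed}$, where $E$ stands for an achievable redundancy-delay exponent. The sketch below ignores constants and polynomial factors, and both the function name and the numerical value of $E$ are assumptions for illustration.

```python
import math

def matching_delay(n, exponent_bits):
    """Smallest integer d such that 2**(-exponent_bits * d) <= 1/n,
    ignoring constants and polynomial factors in both redundancy terms."""
    return math.ceil(math.log2(n) / exponent_bits)

exponent_bits = 1.4                      # hypothetical exponent value, in bits
for n in (10**3, 10**6, 10**9):
    d = matching_delay(n, exponent_bits)
    # d grows by a fixed amount each time n is multiplied by 1000.
    print(n, d, 2.0 ** (-exponent_bits * d) <= 1.0 / n)
```

Each thousandfold increase in the transmission length adds only about $\log_2(1000)/E$ symbols of delay.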

There is still a large gap between the lower and upper 
bounds on the redundancy-delay exponent, where the upper 
bound seems particularly loose. Furthermore, it remains to be 
seen whether the zero-measure set of sources for which the 
upper bound may fail to hold, can be reduced from the set of 
sources that do not satisfy our intricate regularity condition, to 
the set of dyadic sources only, which is the smallest possible. 

Appendix 

Proof of Lemma |6} Let us first show that satisfies 
the conditions for a generalized interval-mapping encoder. 

(sx) C (s) is immediate from the consistency property. 
Let y, z G <Y be distinct, and assume that {sy)r\I^ {sz) ^ 0. 
Then since any two binary intervals are either disjoint or 
one is contained in the other, then without loss of generality 
there exist x'^^x''' such that \£{syx'^)^ C \E{szx'^y), i.e., such 
that £{szx'^) ^ £{syx'^). Since < d, it must be that 

sz ^ syx'^ , in contradiction. This verifies the disjoint nesting 
property. 

By the consistency property, {s) C [£{s)). Suppose that 
there exists a binary interval [h) such that I^{s) C [b) C 
[£{3)). Then £{s) ^ b ^ £{sx'^) for any x'^ e A"', and 
hence by the integrity property it must be that b ^ £{s), in 



contradiction. Hence bin(l'^(s)) = [£{s)) for any s G X*, 
verifying the minimality property. ■ 
Proof of Lemma^ An arithmetic encoder matched to the 
source P is well known to achieve zero asymptotic redundancy 
[:6|, and a bounded expected delay Q, lUl, ID. Therefore 

inf M^(P) < inf M^(P)<0 

£e£(P) £e'8(P) 

Let £ £ ii(P)- Define Bd to be the set of all suffixes that 
allow decoding of any prefix with delay at most d, i.e., 

Bd = {y°^ e : <d,ysex*} 

The lossless property implies that for any e > there exists 
d large enough such that 

P{Bd) >l-e (25) 

Define Bd to be the set of all prefixes in Bd, i.e., 

Bd = {z^' eX^:z^^y°-e Bd} 

Note that by the very definition of Bd, each prefix in Bd must 
appear in Bd with all possible suffixes. Therefore, P{Bd) = 
P{Bd) > 1 ~ e for d large enough. Furthermore the lossless 
property also implies that for any z'^ G Bd, the BV codebook 
C^d : A"" h^> {0,1}* defined by 



(26) 



is a prefix-free lossless codebook, and hence must satisfy 
]E|C^d(X")| > nH{P). Write: 

lU<i{P) = ^, E ^(^') E Pixn\S{x-z'')\ 
n + d ^-^ ^-^ 



z''eX'' 



n + d n + d 

Therefore, 

= \hnMmi+d{P) > lim (^-^—^ l) H{P) 



= -eH{P) 



00 V n + (i 



This holds for any e > 0, hence > 0. ■ 
Proof of Lemma^ Let £ E ltd, and set any e > 0. We 
show that there exists another encoder £' E td such that 



m (P) < m"- (P) + e 

which immediately establishes the Lemma. The encoder $\mathcal{E}'$
will be constructed by properly terminating $\mathcal{E}$. Set $n$ large
enough such that both



and 



n > d + mmja, j 



^iiP) <m^{P) + s/4 



(27) 
(28) 



For any x"- ^ e ^, define 

yd^^n-d^ def argniin{ | } 

namely, $y^d(x^{n-d})$ is the suffix that results in the minimal
codelength after having encoded $x^{n-d}$. Clearly,



(29) 



Construct the new encoder $\mathcal{E}'$ as follows. For any $k < n - d$,
let $\mathcal{E}'(x^k) = \mathcal{E}(x^k)$, and let $\mathcal{E}'(x^{n-d}) = \mathcal{E}(x^{n-d}y^d(x^{n-d}))$.
For $k > n - d$, divide $x^k$ into blocks of equal size $n - d$ (with
the last one possibly shorter), apply the rule above to each
separately, and let $\mathcal{E}'(x^k)$ be the concatenation thereof. Using
jl, we have 



K-AP) -in- d)-'E\£\X^-^)\ H{P) 



(a) n 

< 



-L^JP) - H{P) < - 
d n 



(b) 



< W [P) 



(c) 



-d 



m^iP) 



l<iP) 

■e/4 



-d 



< (P) + e 



where (a) follows from (29), (b) follows from (28), and (c)
follows from the assumption (27). Now, from the concatenated
construction we have that for any $m > n - d$

^t{P) < ' ^' -in-d)- mi_^{p) 



< 



m 

m + n — d 



m^{P)+e) 



and hence 



mr (P) = limsup9t^(P) < 25^ (P) 

m— f C30 

as desired. 

Proof of Lemma [7} 

(i) 



<iP) 



i~-H{p) 

E(-log|bin(X^(X"))|) -H{P) 



<-(E(-logM^(X"))-i/(P")) 



- V P(x")log 



R'niP) 



P(x") 



(ii) Consider the generalized interval mapping representation
of $\mathcal{E}$ given in Lemma 6. This representation satisfies



Thus similarly to the above: 

miiP) = (- log |bin(l^(X")) I) -i/(P) 

> - fE(-log//(X"+'')) —H(P''+'^' 

n \ ' n + d 

" + ^%f,+,(P) + ^i/(P) 
n 



(iii) For any fixed d G N, 

1 " 



k=l 



( 1 " 

\ fc=i 



fc=i 



< O(n-i) - li{P) - -ElogA.f,+,(X"+''^ 



n + d 



n / '"+'^^n 



HiP) 



<mi + o{n-') 

Similarly, 

1 " 1 

— ^Er<i(X'=) > O(n-i) - H{P) - -Elog^^(X'^ 



fe=i 



Proof of Lemma 10: It is easy to see that the number of
$t$-left-adjacents of $p$ that are larger than $a + \delta$ is the number
of ones in the binary expansion of $(p - a)$ up to resolution $\delta$.
Similarly, the number of $t$-right-adjacents of $p$ that are smaller
than $b - \delta$ is the number of ones in the binary expansion of
$(b - p)$ up to resolution $\delta$. Defining $\lceil x \rceil^+ = \max(\lceil x \rceil, 0)$, we
get:

\ss{i,p)\ < [log^r + riog^r 



< 



2 + log 
1 + log ^ 



< l + 21og 



<5 

b-a\ 



5 < p — a,b — p 
o.w. 
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